本文提出了一个开放而全面的框架,以系统地评估对自我监督单眼估计的最新贡献。这包括训练,骨干,建筑设计选择和损失功能。该领域的许多论文在建筑设计或损失配方中宣称新颖性。但是,简单地更新历史系统的骨干会导致25%的相对改善,从而使其胜过大多数现有系统。对该领域论文的系统评估并不直接。在以前的论文中比较类似于类似的需要,这意味着评估协议中的长期错误在现场无处不在。许多论文可能不仅针对特定数据集进行了优化,而且还针对数据和评估标准的错误。为了帮助该领域的未来研究,我们发布了模块化代码库,可以轻松评估针对校正的数据和评估标准的替代设计决策。我们重新实施,验证和重新评估16个最先进的贡献,并引入一个新的数据集(SYNS-Patches),其中包含各种自然和城市场景中的密集室外深度地图。这允许计算复杂区域(例如深度边界)的信息指标。
translated by 谷歌翻译
由于分布式概括是一个普遍不足的问题,因此在不同的研究计划中研究了各种代理目标(例如,校准,对抗性鲁棒性,算法腐败,跨轮班的不变性),导致不同的研究计划,从而提出不同的建议。在共享相同的抱负目标的同时,这些方法从未在相同的实验条件下对真实数据进行测试。在本文中,我们对以前的工作进行了统一的看法,突出了我们经验解决的消息差异,并提供有关如何衡量模型鲁棒性以及如何改进它的建议。为此,我们收集了172个公开可用的数据集对,用于培训和分布外评估准确性,校准错误,对抗性攻击,环境不变性和合成腐败。我们从九个不同的架构中的九个不同的架构中微调了31k网络。我们的发现证实,分布的精度往往会共同增加,但表明它们的关系在很大程度上取决于数据集依赖性,并且通常比以前较小的规模研究所提出的更加细微和更复杂。
translated by 谷歌翻译
尽管在没有标签的情况下,自我监督的学习使有效的表示学习能力,但视频仍然是相对未开发的监督来源。为了解决这个问题,我们提出了像素级对应(PICO),这是一种从视频中进行密集的对比度学习的方法。通过使用光流的跟踪点,我们获得了一个对应图,该图可用于匹配不同时间点的本地特征。我们在标准基准测试上验证了PICO,在多个密集的预测任务上表现优于自我监督基线,而不会损害图像分类的性能。
translated by 谷歌翻译
我们提出了一种新的方法,以综合新姿势的人民观点。我们的新颖可分解渲染器能够从任何观点来合成高度现实的图像。我们的渲染器而不是经过基于网格的结构,而不是经过网格的结构,而是利用直接代表人类的底层骨骼结构的弥漫性高斯基元。渲染这些原语可以通过解码器网络转换成RGB图像的高维潜像。制剂产生了可以训练端到端的完全可分辨率的框架。我们展示了我们对人类3.6M和Panoptic Studio数据集的图像重建方法的有效性。我们展示了我们的方法如何用于个人之间的运动转移;新颖的观看综合从单个相机捕​​获的个体;从任何虚拟角度扫描个体;并重新渲染新颖的姿势。代码和视频结果在https://github.com/guillaumerochette/humanviewsynthesis中获得。
translated by 谷歌翻译
我们提出了简单的主动采样和重新重量策略,以优化最小最大公平性,可以应用于通过损耗最小化学习的任何分类或回归模型。我们的方法背后的关键直觉是在每个TIMESTEP中使用来自当前模型中最差的组的DataPoint,以更新模型。实施的易于实现和我们稳健的制定的一般性使其成为提高糟糕表现群体的模型性能的有吸引力的选择。对于凸起的学习问题,如线性或逻辑回归,我们提供了对我们的策略的细粒度分析,证明了其收敛速度对Min-Max Fair解决方案。
translated by 谷歌翻译
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school. * Equal contribution. This work was done while JL was a Research Fellow at the Alan Turing Institute. 2 https://obamawhitehouse.archives.gov/blog/2016/05/04/big-risks-big-opportunities-intersection-big-dataand-civil-rights 31st Conference on Neural Information Processing Systems (NIPS 2017),
translated by 谷歌翻译
Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
translated by 谷歌翻译
Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, as well as on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1{\deg}. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need of any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base.
translated by 谷歌翻译
Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed such that the discount factor is increased slowly at a rate related to a decreasing learning rate, the resulting method recovers the standard guarantees of gradient ascent on the undiscounted objective.
translated by 谷歌翻译